AITopics

Huang, Zhenwei, Huang, Wen, Jawanpuria, Pratik, Mishra, Bamdev

Riemannian Federated Learning via Averaging Gradient Stream

arXiv.org Artificial Intelligence Sep-11-2024

In recent years, federated learning has garnered significant attention as an efficient and privacy-preserving distributed learning paradigm. In the Euclidean setting, Federated Averaging (FedAvg) and its variants are a class of efficient algorithms for expected (empirical) risk minimization. This paper develops and analyzes a Riemannian Federated Averaging Gradient Stream (RFedAGS) algorithm, which generalizes FedAvg to problems defined on a Riemannian manifold. Under standard assumptions, RFedAGS with fixed step sizes is proven to converge sublinearly to an approximate stationary solution. If decaying step sizes are used, global convergence is established. Furthermore, assuming that the objective obeys the Riemannian Polyak-Łojasiewicz property, the optimality gaps generated by RFedAGS with a fixed step size decrease linearly up to a small upper bound; if decaying step sizes are used, the gaps vanish sublinearly. Numerical simulations conducted on synthetic and real-world data demonstrate the performance of the proposed RFedAGS.

gradf, manifold, rfedags-k, (14 more...)

2409.07223

Country:

Asia > China > Fujian Province > Xiamen (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.81)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Li, Jiaxiang, Balasubramanian, Krishnakumar, Ma, Shiqian

Zeroth-order Riemannian Averaging Stochastic Approximation Algorithms

arXiv.org Machine Learning Sep-25-2023

We present Zeroth-order Riemannian Averaging Stochastic Approximation (Zo-RASA) algorithms for stochastic optimization on Riemannian manifolds. We show that Zo-RASA achieves optimal sample complexities for generating ε-approximate first-order stationary solutions using only one-sample or constant-order batches in each iteration. Our approach employs Riemannian moving-average stochastic gradient estimators and a novel Riemannian-Lyapunov analysis technique for convergence analysis. We improve the algorithm's practicality by using retractions and vector transport instead of exponential mappings and parallel transports, thereby reducing per-iteration complexity. Additionally, we introduce a novel geometric condition, satisfied by manifolds with bounded second fundamental form, which enables new error bounds for approximating parallel transport with vector transport.

artificial intelligence, machine learning, optimization problem, (18 more...)

2309.14506

Country:

North America > United States > California > Yolo County > Davis (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

arXiv.org Artificial Intelligence Aug-21-2023

Decentralized Riemannian Conjugate Gradient Method on the Stiefel Manifold

Chen, Jun, Ye, Haishan, Wang, Mengmeng, Huang, Tianxin, Dai, Guang, Tsang, Ivor W., Liu, Yong

The conjugate gradient method is a crucial first-order optimization method that generally converges faster than the steepest descent method, and its computational cost is much lower than that of second-order methods. However, while various types of conjugate gradient methods have been studied in Euclidean spaces and on Riemannian manifolds, there has been little study of them in distributed scenarios. This paper proposes a decentralized Riemannian conjugate gradient descent (DRCGD) method that aims at minimizing a global function over the Stiefel manifold. The optimization problem is distributed among a network of agents, where each agent is associated with a local function, and communication between agents occurs over an undirected connected graph. Since the Stiefel manifold is a non-convex set, the global function is represented as a finite sum of possibly non-convex (but smooth) local functions. The proposed method is free from expensive Riemannian geometric operations such as retractions, exponential maps, and vector transports, thereby reducing the computational complexity required by each agent. To the best of our knowledge, DRCGD is the first decentralized Riemannian conjugate gradient algorithm to achieve global convergence over the Stiefel manifold.

artificial intelligence, machine learning, optimization problem, (17 more...)

2308.10547

Country:

Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Europe > Russia (0.04)
Asia > Singapore (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.86)

arXiv.org Artificial Intelligence Aug-16-2023

Warped geometric information on the optimisation of Euclidean functions

Hartmann, Marcelo, Williams, Bernardo, Yu, Hanlin, Girolami, Mark, Barp, Alessandro, Klami, Arto

We consider the fundamental task of optimizing a real-valued function defined in a potentially high-dimensional Euclidean space, such as the loss function in many machine-learning tasks or the logarithm of the probability distribution in statistical inference. We use notions from warped Riemannian geometry to recast the optimisation of a function on Euclidean space as optimisation on a Riemannian manifold with a warped metric, and then find the function's optimum along this manifold. The warped metric chosen for the search domain induces a computationally friendly metric tensor for which optimal search directions associated with geodesic curves on the manifold become easier to compute. Performing optimization along geodesics is known to be generally infeasible, yet we show that in this specific manifold we can analytically derive Taylor approximations up to third order. In general these approximations to the geodesic curve will not lie on the manifold; however, we construct suitable retraction maps to pull them back onto the manifold. Therefore, we can efficiently optimize along the approximate geodesic curves. We cover the related theory, describe a practical optimization algorithm and empirically evaluate it on a collection of challenging optimisation benchmarks. Our proposed algorithm, using third-order approximation of geodesics, outperforms standard Euclidean gradient-based counterparts in terms of number of iterations until convergence and an alternative method for Hessian-based optimisation routines.

approximation, artificial intelligence, machine learning, (19 more...)

2308.08305

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Finland > Uusimaa > Helsinki (0.04)
North America > United States > New York (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)

arXiv.org Artificial Intelligence Aug-13-2022

Riemannian accelerated gradient methods via extrapolation

Han, Andi, Mishra, Bamdev, Jawanpuria, Pratik, Gao, Junbin

Optimization on a Riemannian manifold naturally appears in various fields of applications, including principal component analysis [22, 61], matrix completion and factorization [35, 56, 13], dictionary learning [17, 27], and optimal transport [49, 40, 26], to name a few. Riemannian optimization [2, 12] provides a universal and efficient framework for problem (1) that respects the intrinsic geometry of the constraint set. In addition, many non-convex problems turn out to be geodesically convex (a generalized notion of convexity) on the manifold, which yields better convergence guarantees for Riemannian optimization methods. One of the most fundamental solvers is the Riemannian gradient descent method [55, 62, 2, 12], which generalizes the classical gradient descent method in Euclidean space with intrinsic updates on manifolds. There also exist various advanced algorithms for Riemannian optimization, including stochastic and variance-reduced methods [11, 61, 34, 24, 25], adaptive gradient methods [8, 33], quasi-Newton methods [30, 43], trust region methods [1], and cubic regularized Newton methods [3], among others. Nevertheless, it remains unclear whether there exists a simple strategy to accelerate first-order algorithms on Riemannian manifolds. Existing research on accelerated gradient methods focuses primarily on generalizing Nesterov acceleration [42] to Riemannian manifolds, including [37, 4, 63, 6, 31, 36]. However, most of these algorithms are theoretical constructs and are usually less favourable in practice.

exp 1, manifold, retr 1, (15 more...)

2208.06619

Country:

Asia > Middle East > Jordan (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Asia > India (0.04)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

arXiv.org Machine Learning Aug-24-2021

Vector Transport Free Riemannian LBFGS for Optimization on Symmetric Positive Definite Matrix Manifolds

Godaz, Reza, Ghojogh, Benyamin, Hosseini, Reshad, Monsefi, Reza, Karray, Fakhri, Crowley, Mark

This work concentrates on optimization on Riemannian manifolds. The Limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) algorithm is a commonly used quasi-Newton method for numerical optimization in Euclidean spaces. Riemannian LBFGS (RLBFGS) is an extension of this method to Riemannian manifolds. RLBFGS involves computationally expensive vector transports as well as unfolding recursions using adjoint vector transports. In this article, we propose two mappings in the tangent space, one using the inverse square root and one using the Cholesky decomposition. These mappings turn both the vector transport and the adjoint vector transport into the identity, which is therefore isometric. Identity vector transport makes RLBFGS less computationally expensive, and its isometry is also very useful in the convergence analysis of RLBFGS. Moreover, under the proposed mappings, the Riemannian metric reduces to the Euclidean inner product, which is much less computationally expensive. We focus on the Symmetric Positive Definite (SPD) manifolds, which are useful in various fields such as data science and statistics. This work opens a research opportunity for extension of the proposed mappings to other well-known manifolds.
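The claim that the metric reduces to a Euclidean inner product can be checked numerically. The sketch below assumes the affine-invariant metric on SPD matrices, tr(X^{-1} ξ X^{-1} η), which the abstract does not name, and maps tangent vectors by X^{-1/2} ξ X^{-1/2} or by L^{-1} ξ L^{-T} with X = L L^T. All function names are hypothetical and not taken from the paper's code.

```python
import numpy as np

def affine_invariant_inner(X, xi, eta):
    """Assumed metric on the SPD manifold: tr(X^{-1} xi X^{-1} eta)."""
    X_inv = np.linalg.inv(X)
    return np.trace(X_inv @ xi @ X_inv @ eta)

def map_inverse_sqrt(X, xi):
    """Map xi to X^{-1/2} xi X^{-1/2} via an eigendecomposition of X."""
    w, V = np.linalg.eigh(X)
    X_inv_sqrt = (V / np.sqrt(w)) @ V.T  # V diag(w^{-1/2}) V^T
    return X_inv_sqrt @ xi @ X_inv_sqrt

def map_cholesky(X, xi):
    """Map xi to L^{-1} xi L^{-T}, where X = L L^T."""
    L = np.linalg.cholesky(X)
    tmp = np.linalg.solve(L, xi)        # L^{-1} xi
    return np.linalg.solve(L, tmp.T).T  # (L^{-1} xi) L^{-T}

def euclidean_inner(a, b):
    """Plain Frobenius inner product, tr(a^T b)."""
    return np.sum(a * b)

# Small demonstration on random data.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
X = A @ A.T + 4 * np.eye(4)            # a random SPD matrix
B = rng.standard_normal((4, 4))
C = rng.standard_normal((4, 4))
xi, eta = B + B.T, C + C.T             # symmetric tangent vectors

reference = affine_invariant_inner(X, xi, eta)
via_sqrt = euclidean_inner(map_inverse_sqrt(X, xi), map_inverse_sqrt(X, eta))
via_chol = euclidean_inner(map_cholesky(X, xi), map_cholesky(X, eta))
```

Under this assumed metric, `reference`, `via_sqrt`, and `via_chol` agree up to rounding, so after mapping only a Frobenius inner product is needed.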

mapping, vector transport, vtf, (17 more...)

2108.11019

Country:

Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
North America > United States > District of Columbia > Washington (0.04)
North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

arXiv.org Machine Learning Feb-3-2020

Efficient Riemannian Optimization on the Stiefel Manifold via the Cayley Transform

Li, Jun, Fuxin, Li, Todorovic, Sinisa

Strictly enforcing orthonormality constraints on parameter matrices has been shown to be advantageous in deep learning. This amounts to Riemannian optimization on the Stiefel manifold, which, however, is computationally expensive. To address this challenge, we present two main contributions: (1) a new efficient retraction map based on an iterative Cayley transform for optimization updates, and (2) an implicit vector transport mechanism based on the combination of a projection of the momentum and the Cayley transform on the Stiefel manifold. We specify two new optimization algorithms: Cayley SGD with momentum, and Cayley ADAM on the Stiefel manifold. Convergence of Cayley SGD is theoretically analyzed. Our experiments for CNN training demonstrate that both algorithms: (a) use less running time per iteration relative to existing approaches that enforce orthonormality of CNN parameters; and (b) achieve faster convergence rates than the baseline SGD and ADAM algorithms without compromising the performance of the CNN. Cayley SGD and Cayley ADAM are also shown to reduce the training time for optimizing the unitary transition matrices in RNNs.
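For orientation, a Cayley-transform update on the Stiefel manifold can be sketched as follows. This is the standard closed-form Cayley transform solved with a dense linear system, not the paper's iterative variant or its momentum handling. The skew-symmetric matrix W = G X^T - X G^T is one common choice and is an assumption here, as is the function name `cayley_step`.

```python
import numpy as np

def cayley_step(X, G, step):
    """One descent step that keeps X^T X = I.

    X    : (n, p) matrix with orthonormal columns
    G    : (n, p) Euclidean gradient of the loss at X
    step : positive step size
    """
    n = X.shape[0]
    # W is skew-symmetric by construction (W^T = -W).
    W = G @ X.T - X @ G.T
    I = np.eye(n)
    # Cayley transform: Q = (I + step/2 W)^{-1} (I - step/2 W) is orthogonal
    # because W is skew-symmetric, so Q @ X stays on the manifold.
    return np.linalg.solve(I + 0.5 * step * W, (I - 0.5 * step * W) @ X)

# Small demonstration on random data.
rng = np.random.default_rng(0)
X0, _ = np.linalg.qr(rng.standard_normal((6, 3)))  # random point on St(6, 3)
G0 = rng.standard_normal((6, 3))                   # stand-in gradient
X1 = cayley_step(X0, G0, 0.1)
orthonormality_error = np.linalg.norm(X1.T @ X1 - np.eye(3))
```

The dense solve costs O(n^3), which is the kind of expense an iterative scheme would aim to avoid; the sketch only illustrates why the update preserves orthonormality.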

cayley sgd, cayley transform, stiefel manifold, (16 more...)

2002.01113

Country:

North America > United States > Oregon > Benton County > Corvallis (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Kasai, Hiroyuki, Sato, Hiroyuki, Mishra, Bamdev

Riemannian stochastic quasi-Newton algorithm with variance reduction and its convergence analysis

arXiv.org Machine Learning Sep-16-2017

Stochastic variance reduction algorithms have recently become popular for minimizing the average of a large, but finite number of loss functions. The present paper proposes a Riemannian stochastic quasi-Newton algorithm with variance reduction (R-SQN-VR). The key challenges of averaging, adding, and subtracting multiple gradients are addressed with notions of retraction and vector transport. We present convergence analyses of R-SQN-VR on both non-convex and retraction-convex functions under retraction and vector transport operators. The proposed algorithm is evaluated on the Karcher mean computation on the symmetric positive-definite manifold and the low-rank matrix completion on the Grassmann manifold. In all cases, the proposed algorithm outperforms the state-of-the-art Riemannian batch and stochastic gradient algorithms.

artificial intelligence, machine learning, optimization problem, (19 more...)

1703.0489

Country: Asia > Japan > Honshū (0.28)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Sato, Hiroyuki, Kasai, Hiroyuki, Mishra, Bamdev

Riemannian stochastic variance reduced gradient

arXiv.org Machine Learning Apr-10-2017

Stochastic variance reduction algorithms have recently become popular for minimizing the average of a large but finite number of loss functions. In this paper, we propose a novel Riemannian extension of the Euclidean stochastic variance reduced gradient algorithm (R-SVRG) to a manifold search space. The key challenges of averaging, adding, and subtracting multiple gradients are addressed with retraction and vector transport. Under some natural assumptions, we present a global convergence analysis of the proposed algorithm with a decaying step size and a local convergence rate analysis with a fixed step size. The proposed algorithm is applied to problems on the Grassmann manifold, such as principal component analysis, low-rank matrix completion, and computation of the Karcher mean of subspaces, and outperforms the standard Riemannian stochastic gradient descent algorithm in each case.
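The role of vector transport in combining gradients taken at different points can be illustrated with a small sketch. It uses the unit sphere rather than the Grassmann manifold for brevity, a projection-based vector transport, a normalization retraction, and the toy loss f_i(w) = -0.5 (z_i · w)^2. All of these choices and all function names are assumptions for illustration, not the paper's setup.

```python
import numpy as np

def proj(w, v):
    """Project v onto the tangent space of the unit sphere at w.
    Also used here as a projection-based vector transport to w."""
    return v - (w @ v) * w

def retract(w, xi):
    """Retraction on the sphere: step in the tangent direction, then normalize."""
    y = w + xi
    return y / np.linalg.norm(y)

def rgrad_i(Z, i, w):
    """Riemannian gradient of f_i(w) = -0.5 * (z_i . w)^2 at w."""
    z = Z[i]
    return proj(w, -(z @ w) * z)

def rgrad_full(Z, w):
    """Riemannian gradient of the average loss (1/N) sum_i f_i at w."""
    return proj(w, -(Z.T @ (Z @ w)) / Z.shape[0])

def svrg_direction(Z, i, w, w_anchor, full_grad_anchor):
    """Variance-reduced direction at w.

    The correction term lives in the tangent space at w_anchor, so it is
    transported to the tangent space at w before being subtracted."""
    correction = rgrad_i(Z, i, w_anchor) - full_grad_anchor
    return rgrad_i(Z, i, w) - proj(w, correction)

# Small demonstration on random data.
rng = np.random.default_rng(0)
Z = rng.standard_normal((20, 5))
w_anchor = rng.standard_normal(5)
w_anchor /= np.linalg.norm(w_anchor)
g_anchor = rgrad_full(Z, w_anchor)       # computed once per outer loop
w = retract(w_anchor, 0.1 * proj(w_anchor, rng.standard_normal(5)))

directions = np.array(
    [svrg_direction(Z, i, w, w_anchor, g_anchor) for i in range(Z.shape[0])]
)
w_next = retract(w, -0.05 * directions[0])  # one inner step with sample 0
```

Averaged over all samples, the direction equals the full Riemannian gradient at `w`, because the transported correction terms average to zero.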

artificial intelligence, machine learning, r-svrg, (18 more...)

1702.05594

Country: Asia > Japan (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)